Abstract

The Internet, as a global system of interconnected networks, carries an extensive array of information
resources and services. Key requirements include good quality-of-service and protection of
the infrastructure from nefarious activity (e.g. distributed denial of service—DDoS—attacks). Network 
monitoring is essential to network engineering, capacity planning and prevention / mitigation of threats. 

We develop an open source architecture, AMON (All-packet MONitor), for online monitoring and 
analysis of multi-gigabit network streams. It leverages the high-performance packet monitor PF_RING
and is readily deployable on commodity hardware. AMON examines all packets, partitions traffic into 
sub-streams by using rapid hashing and computes certain real-time data products. The resulting data 
structures provide views of the intensity and connectivity structure of network traffic at the time- 
scale of routing. The proposed integrated framework includes modules for the identification of heavy- 
hitters as well as for visualization and statistical detection at the time-of-onset of high impact events 
such as DDoS. This allows operators to quickly visualize and diagnose attacks, and limit offline and 
time-consuming post-mortem analysis. We demonstrate our system in the context of real-world attack 
incidents, and validate it against state-of-the-art alternatives. AMON has been deployed and is currently 
processing 10Gbps+ live Internet traffic at Merit Network. It is extensible and allows the addition of
further statistical and filtering modules for real-time forensics. 

Index Terms 

Network monitoring, detection, identification, visualization, PF_RING, gigabit streams, commodity
hardware, data products, algorithms, statistics, heavy tails, extreme value distribution, network attacks. 

I. Introduction 

Motivation and background: The Internet has become a vital resource to business, governments 
and society, worldwide. It has thrived and grown under diverse conditions and technologies with
little to no centralized regulation. Its fundamental design principles have successfully ensured its
robustness and broad accessibility. However, these same principles do not provide centralized 
management and/or monitoring of the entire network. Therefore, understanding broad based 
patterns such as traffic loads and thus adequacy of capacity and quality-of-service, composition 
of network traffic, adoption of new protocols and applications, are challenging tasks. Such traffic 
characterization problems and the corresponding network engineering, management and capacity 
planning solutions made necessary the analysis of large volumes of data, and also gave rise to 
statistical techniques to handle them, such as streaming algorithms, sketches,
tomography, and the analysis of heavy tails and long range dependence.

In addition, this decentralized design leaves room for numerous vulnerabilities and security threats, both to the infrastructure
and to its user base. For example, malicious activities such as distributed denial of service
(DDoS) attacks are relatively easy to implement and rather hard to prevent, since best practices
like origin IP anti-spoofing (e.g., BCP38 recommendation [10]) are not universally deployed
by network operators. Their timely detection at appropriate short time-scales (e.g. in seconds) 
requires processing vast amounts of meta-data (e.g., NetFlow) distributed throughout the entire 
network, thus making it a challenging task. Further, the diverse hardware and software
infrastructure of the network, which is not centrally controlled, leaves many additional vulnerabilities open to exploitation
by adversaries. A recent example features the exploitation of misconfigured NTP (network time
protocol) servers, which led to one of the largest DDoS attacks ever recorded [11]. In such reflection
and amplification attacks [12], multiple small requests are sent to several misconfigured NTP
servers (or other UDP-based services), which respond by transmitting large amounts of data to
targeted hosts. Thus, the intended victims get overwhelmed with traffic and temporarily disabled.
Volumetric DDoS attacks are just one possible scenario; low-volume DDoS activities that rely 
on traffic sparseness to avoid detection are also of concern. It is important to be able to defend 
against the DDoS threat model and detect the onset of such potentially unpredictable attacks in 
order to adequately secure the network, e.g., by filtering (blocking) traffic or deploying security 
patches to network gear. 

The key to addressing these diverse topics is the availability of adequate data coupled with 
advanced monitoring and analysis tools and the corresponding software infrastructure. The vast
volume of network traffic streams makes collection, storage and processing of all traffic data
infeasible. Therefore, the focus has been on the information available in packet headers, such
as source and destination addresses, application ports, protocol, payload size, etc. While this
type of meta-data is more manageable, its rate of occurrence is still very fast. For example,
storing packet header information (say, the initial 96 bytes of an Ethernet frame) from a 10 GE link
at Merit¹ at a rate of 1.8 Mpps (million packets per second) requires 1.7 GB per 10 seconds
(equivalently, around 15 TB per day). The industry has developed tools such as NetFlow
and others (sFlow, etc.), which effectively compress the packet meta-data by grouping them
into flows. NetFlow-like traffic sampling functionality is available on many network elements.
This compression mechanism, however, creates an intermediate step, which introduces a delay
in the access to traffic meta-data (in addition to distorting its structure).
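The storage figures quoted above follow from simple arithmetic. The short sketch below reproduces them, assuming decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a constant packet rate; the function and constant names are illustrative only.

```python
# Back-of-the-envelope storage estimate for captured packet headers.
HEADER_BYTES = 96      # bytes retained per packet (figure quoted in the text)
PACKET_RATE = 1.8e6    # packets per second (1.8 Mpps, figure quoted in the text)


def header_volume_bytes(seconds: float) -> float:
    """Bytes of header data accumulated over the given number of seconds."""
    return HEADER_BYTES * PACKET_RATE * seconds


per_10s_gb = header_volume_bytes(10) / 1e9           # decimal gigabytes
per_day_tb = header_volume_bytes(24 * 3600) / 1e12   # decimal terabytes
print(f"{per_10s_gb:.2f} GB per 10 s, {per_day_tb:.1f} TB per day")
```

The result is about 1.7 GB per 10 seconds and just under 15 TB per day, in line with the numbers in the text.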

Even if one has access to raw data on packet headers or NetFlow, its high acquisition rate
often makes online analysis of this information a formidable challenge. Many conventional
statistical methods and algorithms are not sequential in nature and require access to large batches
of data spanning several minutes to hours. Thus, possible DDoS attacks or changes in the
network traffic patterns will be detected with offline analysis several minutes after their onset.
The time-scales of such analyses are not desirable, if the goal is to prevent large-scale network
outages. Note that specialized, albeit very expensive, appliances exist (e.g., Arbor Networks’
PeakFlow), but in real-world settings they are configured to receive heavily sampled Flow data (i.e.,
sampling rates of 1:1000 or more). Hence, low-volume or short-term attacks may elude detection.
Further, such tools require a priori knowledge of baseline traffic patterns.

These challenges motivate us to develop new software and algorithmic infrastructure for
harvesting and monitoring network traffic data at the time-scale of routing (i.e., at wire-speed),
guided by the following principles: (a) examine all packets at the monitoring host; (b) develop
memory efficient data structures and statistical summaries that can be computed and retained

¹Merit Network, Inc. operates Michigan’s research and education network. It is an Internet service provider that serves a
population of nearly 1 million users. Merit is the largest IP network in Michigan, and its network includes a wide range of link
types, with link speeds from T1 through 100Gbps. The network backbone consists of a 100G fiber ring, which passes
through the major cities of Michigan, as well as Chicago.


at the time-scale of routing; (c) be easy to build and deploy using commodity, inexpensive,
off-the-shelf hardware; (d) make the resulting data products available to be communicated and
shared in real-time with centralized monitoring stations for further forensics; and (e) allow
for interactive filtering in real-time.

Related Work: Over the past 15 years, many practical tools have been developed for intrusion
detection. For example, Snort (see snort.org), Suricata (suricata-ids.org) and Bro
(bro.org) are popular tools that rely on signature-based methods to examine traffic data for
known malicious patterns. Nevertheless, recent malware often manages to evade pattern-matching
detection by becoming polymorphic (i.e., existing in various forms via encryption). The proposed
work aims to complement existing tools by adopting a behavior-based approach instead.

There exists a noteworthy amount of literature in the area of statistical, behavior-based 
anomaly detection. Standard techniques that seek ‘change detection’ points in traffic time series
include exponential smoothing [13] or other more general time-series techniques [14], [15]. More
recent methods employ wavelet-based tools [16] or subspace reduction methodologies based
on principal component analysis [17], [18]. Such methods lack the capability of identifying
the actual ‘heavy-hitters’ and, most importantly, suffer from the ‘dimensionality curse’ (i.e.,
having multi-dimensional features to monitor) and/or are inadequate for online realization on
fast, multi-gigabit streams. Hence, there has been a lot of activity in the theoretical computer
science community on designing and studying efficient algorithms for data streams that aim to
alleviate the high dimensionality and high ‘velocity’ constraints. Many summary data structures
(i.e., sketches) have been developed to address the challenging problems of identification of
heavy-hitters or frequent items in a stream [19]-[25], anomaly detection in high-dimensional
regimes Q, p4| , p6| , p7| , compressed sensing and estimation of frequency moments Q, 


|, community mining p4| , etc. (see p5| and references therein). 

In reality, the mere access to fast data streams involves formidable technical challenges.
Many of the more sophisticated sketch-based algorithms tackle the
multi-dimensional aspect of the problem, but implementing them on multi-gigabit streams is
rather challenging or often impossible without the use of specialized hardware (e.g., FPGA).


In addition, few frameworks (e.g., [24], [26]) take a holistic approach to develop methods
that address the problems of change detection and identification together. Nevertheless, we
acknowledge the presence of considerable previous work on the topics of network monitoring,
troubleshooting and intrusion discovery. At the same time, our open-source platform offers a
novel extensible framework that couples together the important problems of detection,
identification and visualization of aberrant behavior in multi-gigabit streams. We propose new
algorithms that are a direct consequence of the data products generated by our framework.
Furthermore, there are relatively few tools that could allow network engineers to interactively
examine 10Gbps+ traffic streams on the time scale of routing. This motivates the approach we
have adopted in this paper, which focuses on leveraging several recent advances in high-speed
packet capture to provide tools that are easy and inexpensive to deploy, examine every packet
at the interface, and provide simple statistics of the ‘signal’ that allow network engineers to (a)
visualize structural aspects of traffic, (b) detect changes in intensity or structure of traffic
sub-flows, (c) potentially filter and zoom-in on anomalous IP address ranges identified automatically
and (d) identify ranges of exact IP addresses associated with anomalous events. Table I highlights
our contributions and identifies differences with previous literature.

Major Contributions: Our first contribution is the design and implementation of a software
monitoring framework, referred to as AMON (All-packet MONitor), working reliably over 10Gbps+
links. This framework is based on PF_RING [36] in zero-copy mode, which efficiently delivers
packets to the monitoring application by bypassing the OS kernel. We implement hash-based
traffic summaries, which randomly assign source and destination pairs to bins, providing
an aggregate but essentially instantaneous picture of the traffic, which can be used to diagnose

²As noted in [27], it is limited to relatively low rates in an attempt to not overload the device and affect forwarding actions.










(a) High-level architecture of AMON. (b) Software performance. 

Fig. 1: The proposed framework. Left: AMON’s data products comprise the input of identification, statistical detection and 
visualization modules introduced in this paper. Right: Performance at rates exceeding 20Gbps; minute drop rates recorded. 


and visualize changes in structure and intensity. 

Our second contribution is a suite of statistical tools for automatic detection of significant
changes in the structure and intensity of traffic. The accompanying instances of the Boyer-Moore
majority vote algorithm [37] can be leveraged to identify precise IP addresses associated with
attacks; this is an important contribution as well. We illustrate the new data acquisition framework
with several views of the resulting data structures (hash-binned arrays), referred to as ‘databricks’.
We show (with real data from Merit) that even basic visualization tools and algorithms applied
to the right type of data can help instantaneously identify distributed attacks that do not
contribute to large traffic volume (see the ‘SSDP’ and ‘Tor’ case studies, Section V-D).

This paper is organized as follows: Section II introduces our monitoring architecture, including
the data products (Figure 2) and our software prototype; Section III introduces AMON’s
identification component; Section IV discusses our statistical methodology for automatic detection.
Section V evaluates our software and algorithms on a rich set of real-world Internet data,
including four real DDoS case studies, and compares against successful and robust state-of-the-art
methods for detection and identification [19], [26].


II. Data and software infrastructure 

An overview of the proposed architecture is portrayed in Figure 1a. The monitoring application
is installed on a machine that receives raw packets in a streaming fashion. In our prototype
at Merit Network, the monitoring probe receives traffic via a passive traffic mirror (using a
SPAN, i.e., switched port analyzer, setting) configured on a network switch. Packets are then
efficiently delivered at 10Gbps+ rates to the monitoring module via PF_RING ZC. Subsequently,
all packets are processed in a streaming fashion for constructing, via efficient hashing, a data
matrix (i.e., the databrick depicted in Figure 2) and a separate matrix containing the most active
source-destination flows identified via our extension of the Boyer-Moore algorithm (Section III).
Periodically, these data products are shipped to a database for storage, further analysis and
dashboard-based visualizations. These data are analyzed through various detection algorithms
described in Section IV. Flows flagged by the detection module can be extracted for further
analysis by the corresponding filtering mode that is currently under development.

A. Data products via pseudo-random hash functions 

Internet traffic monitored at a network interface can be viewed as a stream of items $(\omega_n, v_n)$, $n = 1, 2, \ldots$.
The $\omega_n \in \Omega$ are the keys and the $v_n$ are the updates (e.g., payload) of the
stream signal. For example, the set of keys could be all IPv4 addresses ($\Omega = \{0,1\}^{32}$), IPv6
addresses, or pairs of source-destination IP addresses ($\Omega = \{0,1\}^{64}$); keys may also include source and
destination ports, etc., while payloads could be bytes, packets, distinct ports, etc. Since it is not
feasible to store and manipulate the entire signal when monitoring 10Gbps+ links, we employ
hashing to compress the domain of the incoming stream keys into a smaller set. Collisions are
allowed and, in fact, expected, but the hash function is chosen so that it spreads out the set of
observed keys approximately uniformly.

Consider for example the set of IPv4 addresses $\{0,1\}^{32}$ and let $h : \{0,1\}^{32} \to \{1, \ldots, m\}$
be a hash function that uniformly spreads the addresses over the set
$\{1, \ldots, m\}$. Upon observing the key $(s, d)$ of the source and destination of a packet, we compute
the hashes $i := h(d)$ and $j := h(s)$ and update the data matrix $X = (X(i,j))_{m \times m}$ as $X(i,j) := X(i,j) + v$.
This matrix constitutes the first data output of our architecture and is depicted in
Fig. 2. It is emitted at periodic intervals (e.g., 1 or 10 seconds) to a centralized database for
online as well as further downstream analysis, and then reinitialized. The row- and column-sums of
this matrix yield the destination- and source-indexed hash-binned arrays, also depicted in the
figure. These data products are used as inputs for the detection and visualization algorithms
described below.
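As an illustration of the update rule just described, the following minimal sketch accumulates a databrick from a list of (source, destination, payload) triples and derives the two one-dimensional arrays. It is not AMON's implementation: the BLAKE2b-based `bin_of` is a stand-in for the fast pseudo-random hash, bins are indexed from 0 rather than 1, and all function names are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple

M = 128  # number of hash bins per dimension (illustrative choice)


def bin_of(ip: str, m: int = M) -> int:
    """Map an address string to a bin in {0, ..., m-1} (stand-in for h)."""
    digest = hashlib.blake2b(ip.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


def build_databrick(packets: Iterable[Tuple[str, str, int]], m: int = M) -> List[List[int]]:
    """Accumulate X[i][j] += v with i = h(destination) and j = h(source)."""
    X = [[0] * m for _ in range(m)]
    for src, dst, v in packets:
        X[bin_of(dst, m)][bin_of(src, m)] += v
    return X


def destination_array(X: List[List[int]]) -> List[int]:
    """Row sums: traffic volume per destination bin."""
    return [sum(row) for row in X]


def source_array(X: List[List[int]]) -> List[int]:
    """Column sums: traffic volume per source bin."""
    return [sum(col) for col in zip(*X)]
```

Many sources sending to one victim populate a single row of the matrix, which is the horizontal stripe pattern that the destination array then exposes as one outstanding bin.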







Fig. 2: Our data products. These data arrays, generated online by our PF_RING-based software, are used as the basic input
structures for our detection algorithms. Left: The ‘databrick’ matrix during the ‘Library’ attack (see Section V); the apparent
horizontal stripe (at dest ‘bin’ 82) signifies traffic from multiple sources to a single destination (victim). Middle: View of sources
array, constructed by a matrix column-sum. Right: Destinations array; observe that ‘bin’ 82 stands out (notice the log scale).


B. Software implementation: PF_RING-based Monitoring

The AMON monitoring application is powered by PF_RING, a high performance packet
capture network socket. Modern hardware advances in CPU speeds and architecture, memory
bandwidth and I/O buses have shifted the bottlenecks in multi-gigabit packet reception into the
software stack [40], [41]. PF_RING avoids unnecessary memory copies between the operating
system layers, and hence the length of the packet journey between the network interface (NIC)
and the monitoring application is shortened. Consequently, the number of CPU cycles spent for
transferring packets from their NIC entry point to the application is significantly reduced. This
leads to optimal memory bandwidth utilization [40], [41], and therefore to extremely efficient
packet processing speeds (see Figure 1b).

Our system takes advantage of the zero-copy framework that PF_RING offers. In this mode,
the monitoring application reads packets directly from the network interface, i.e., both the
OS kernel and the PF_RING module are bypassed. As a result, efficient monitoring is now
achievable with commodity, off-the-shelf hardware. For example, all experiments in this study
were conducted using NICs costing below 800 USD. Although alternative fast packet
processing schemes exist [41], PF_RING was selected due to its robustness, proven efficiency
and broad versatility.

III. Identification: the hash-thinned Boyer-Moore algorithm 


The proposed architecture periodically emits a list of heavy activity stream elements that
can be used for traffic engineering purposes, accounting and security forensics. When an alert
is raised, operators can readily examine these ‘heavy-hitters’. Our identification algorithm is
based on the Boyer-Moore (BM) majority vote algorithm [37], and the idea of stream thinning
for creating sub-streams described below. The so-named MJRTY Boyer-Moore algorithm [37]
can identify exactly the majority element (the element whose volume is more than 50% of the
total) in a stream, if one exists. It solves the problem in time linear in the length of the input
sequence and in constant memory. We first define the identification problem at hand.


Problem Addressed 1. (Identification of Heavy-hitters) Given an input stream $(\omega_n, v_n)$,
identify the top-$K$ most frequent items. The frequency of key $\omega$ is the sum of its updates $v$.


Next, we describe the original Boyer-Moore algorithm with an analogy to the one-dimensional
random walk on the line of non-negative integers. A variable count is initialized to 0 (i.e., the
origin) and a candidate variable cand is reserved for use. Once a new key arrives, we check to
see if count is 0. If it is, that IP is set to be the new candidate cand and we move count one
step up, i.e., count = 1. Otherwise, if the IP is the same as cand, then cand remains unchanged
and count is incremented, and, if not, count moves one step down (decremented). We then
proceed to the next IP and repeat the procedure. Provably, when all IPs are read, cand will hold
the one with majority, if a majority exists.
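The random-walk description above translates almost line by line into code. The sketch below is a plain rendering of the classical unit-weight algorithm; the helper `is_majority` is an addition for illustration, since the single pass only yields a candidate and a second pass is needed to confirm that a majority actually exists.

```python
from typing import Hashable, Iterable, Optional, Sequence


def mjrty(stream: Iterable[Hashable]) -> Optional[Hashable]:
    """Return the Boyer-Moore majority candidate of the stream (None if empty)."""
    cand = None
    count = 0
    for key in stream:
        if count == 0:
            cand, count = key, 1   # adopt a new candidate, step up from the origin
        elif key == cand:
            count += 1             # same key: one step up
        else:
            count -= 1             # different key: one step down
    return cand


def is_majority(items: Sequence[Hashable], cand: Hashable) -> bool:
    """Second pass: does cand occupy strictly more than half of the items?"""
    return 2 * sum(1 for k in items if k == cand) > len(items)


ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]
candidate = mjrty(ips)
print(candidate, is_majority(ips, candidate))
```

Only one candidate and one counter are kept, which is what gives the constant-memory, linear-time behavior mentioned in the text.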

Our extension of the MJRTY Boyer-Moore method applied to each sub-stream is outlined in
Figure 3. It returns up to $m$ ‘heavy-hitter’ items present in a stream of keys, taking values in
$\{\omega_1, \ldots, \omega_N\}$, by ‘thinning’ the original stream $S$ into $m$ sub-streams. In the update operation,
upon observing a new stream item $(\omega, v)$ we compute the sub-stream index $s := h_1(\omega)$ using
hash function $h_1$. In essence, we run $m$ independent realizations of the Boyer-Moore algorithm
described above, one for each sub-stream. Arrays count and cand hold the algorithm state
for all sub-streams, i.e., cand[s] holds the majority candidate for $s$. Array count is updated
accordingly with the value of $v$ as lines 9, 11 and 16 depict. The auxiliary flag for each
sub-stream $s$ can help track whether a majority is indeed present in that sub-stream; at the start
of the monitoring period the flag corresponding to $s$ is set, and as long as count[s] remains
non-negative (i.e., cand[s] needs no updates), the flag never resets. A flag that remains ‘on’
(a) Initialization

(b) Query operation

 1: P_est[s] = max_j P_bm[s, j], s ∈ [m]
 2: Initialize O = ∅
 3: for i = 1 to K do
 4:   Find o = argmax_{j ∈ [m] \ O} P_est[j]
 5:   O = O ∪ {o}   # Exclude for next iteration
 6:   Output ω_i = cand[o]
 7:   Output P_est[o]
 8: end for

(c) Update operation for stream item (ω, v)

10: if count[s] < 0 then
11:   cand[s] = ω, count[s] = -count[s]   # reset cand
12:   flag[s] = 0   # reset flag
13: end if
14:
15: else
16:   cand[s] = ω, count[s] = v   # reset cand
17:   flag[s] = 0   # reset flag
18:
19: end if
20: end if
21: end if

Fig. 3: Identification algorithm: Hash-thinned MJRTY Boyer-Moore.


guarantees the presence of a majority item.³ An estimate of the frequency of each hitter can be
obtained via the $m \times m'$ data structure $P_{bm}$. Through the use of an independent hash function
$h_2$, this sketch structure keeps a hash array of size $m'$ for each sub-stream $s$, and gets updated
with the arrival of each stream element.

When a query operation is performed (see Figure 3b), we retrieve the $m$ candidates. An
estimate of the volume of the candidate ‘hot’ item of sub-stream $s$ is recovered by looking at the maximum
value of the sub-array $P_{bm}[s]$. The $m$ candidates are ranked according to these estimates, and an
approximation of the set of top-$K$ hitters is identified.

IV. Statistical Methods for Anomaly Detection 

This section introduces three new detection methods. We start with a data exploration and
model validation discussion that characterizes our data products; all methods leverage AMON’s
data. The first method is based on estimating the number of ‘heavy hitters’ at each time point;


³As an example, during the real-time 5-day experiment with the Chicago traffic (Figure 1b), an average fraction of at least
85.41% of all sub-streams (with 0.11% standard deviation) contained a majority element (for bytes).






this estimate represents a monitoring statistic that one can track, and time points with heavy
hitter activity can be flagged as anomalies. Next, we introduce a method derived by modeling the distribution
of the relative volume of the heaviest bins in the source and destination one-dimensional
hash-binned arrays. This section concludes with a technique for discovering structural
changes in traffic, a method chiefly suitable for seemingly innocuous, low volume aberrant
behavior. Henceforth, the problem of anomaly detection is formulated as follows.


Problem Addressed 2. (Detection) Given an input traffic stream, find the time points when
the baseline probability distribution of an appropriate monitoring statistic seems inadequate.


A. Statistics, model validation and data exploration

The successful detection of statistically significant anomalies in the derived hash-binned traffic
arrays depends on the adequacy of the model employed. We undertook an extensive empirical
analysis of long hash-binned arrays and found that heavy tails are ubiquitous. Figure 4 (left
panel, top figure) shows a time series of the linearized hash-binned array of outgoing (Source)
traffic at Merit Network for the period 17:30-18:30 EST on July 22, 2015. Observe the consistent
presence of extreme peaks in the data, some of which may in fact be due to an attack event
(see the ‘Library’ case study, Section V). By zooming in on a short (seemingly calm) 3-minute
period (bottom right), we observe that the extreme peaks, although of lower magnitude, persist.

Heavy-tailed power law distributions are suitable statistical models for data exhibiting such
characteristics. Power laws are ubiquitous in computer network traffic measurements. It is
well known and documented that file sizes, web pages, Ethernet traffic, etc. exhibit power-law
tails [42], [43]. Specifically, let $X = X_t(i)$ denote, for example, the amount of traffic registered
in a given hash-array bin $i$. Then, a parsimonious model for its tail is as follows:

$$P(X > x) \sim c\,x^{-\alpha}, \quad \text{as } x \to \infty, \qquad (1)$$

where ‘$\sim$’ means that the ratio of the left- to the right-hand side converges to 1 and where $\alpha > 0$
and $c > 0$ are constants. The smaller the exponent $\alpha$, the heavier the tail of the distribution,
and the greater the frequency of extreme values. In particular, if $\alpha < 2$, then the variance of $X$
is infinite and if $\alpha < 1$, then the mean of this model is infinite.
Fig. 4: Left panel: Time-series of Source hash-binned arrays (Top) and its zoomed-in version (Bottom-right), computed over
10-second windows. The max-spectrum of the entire time series is plotted on the bottom-left. Merit Network: 17:30-18:30 EST,
July 22, 2015. Right panel: Merit Network 16:00-17:00 EST, Aug 1, 2015, the ‘Tor’ event in Section V-D. (Top-left) Ingress
connectivity for the top N = 3000 hash-binned flows per 10-second windows over 1-hour. (Top-right) QQ-plots demonstrating
accuracy of the Normal approximation of typical in-degree distributions. (Bottom plots) QQ-plots for anomalous bins.


Figure 4 (left panel, bottom) shows the max-spectrum of the entire 1-hour long time series
of the hash-binned source traffic array. The max-spectrum is a plot of the mean log-block-maxima
versus the log-block-sizes of the data. A linear trend indicates the presence of power-law tails
as in (1), while the slope provides a consistent estimate of $1/\alpha$. Thus, steeper max-spectra
correspond to lower values of $\alpha$ and heavier tailed distributions generating more extreme values.
A useful feature of the max-spectrum plot is its ability to examine various log-block-sizes
(scales), thus enabling simultaneous examination of the power-law behavior in the data at various
time-scales [44]. As can be seen, the power-law behavior (linearity in the spectrum) extends
over a wide range of time-scales from seconds to hours. The time-scale relevant to our studies
is a few seconds, which yields estimates of $\alpha \approx 1.6$, obtained by fitting a line over the range
of scales ($\log_2$-block-sizes) 1 through 6. Over intermediate time-scales (a few minutes) the
exponent $\alpha$ rises to about 2.5. The simple power-law models are no longer sufficient to capture
the distribution over the largest time-scales (hours), where complex intermittent non-stationarity
and diurnal trends dominate.
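To make the construction concrete, the sketch below computes a max-spectrum and reads off a tail exponent from its slope. It is a simplified illustration, not the estimator of [44]: it uses an ordinary (unweighted) least-squares fit, non-overlapping dyadic blocks, and synthetic Pareto data with a known exponent in place of real hash-binned traffic; the function names and the chosen range of scales are assumptions.

```python
import math
import random
from typing import List, Sequence


def max_spectrum(data: Sequence[float], max_scale: int) -> List[float]:
    """Entry j-1 holds the mean of log2(block maximum) over blocks of size 2**j."""
    spectrum = []
    for j in range(1, max_scale + 1):
        size = 2 ** j
        n_blocks = len(data) // size
        if n_blocks == 0:
            raise ValueError("not enough data for scale %d" % j)
        total = 0.0
        for k in range(n_blocks):
            total += math.log2(max(data[k * size:(k + 1) * size]))
        spectrum.append(total / n_blocks)
    return spectrum


def tail_exponent(data: Sequence[float], j_lo: int, j_hi: int) -> float:
    """Invert the least-squares slope of the max-spectrum over scales j_lo..j_hi."""
    y = max_spectrum(data, j_hi)[j_lo - 1:]
    x = list(range(j_lo, j_hi + 1))
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return 1.0 / slope


rng = random.Random(0)                      # fixed seed for repeatability
sample = [rng.paretovariate(1.5) for _ in range(2 ** 17)]   # true exponent 1.5
alpha_hat = tail_exponent(sample, 5, 11)
print(f"estimated tail exponent: {alpha_hat:.2f}")
```

The input must be strictly positive for the logarithm to be defined, so empty bins in real hash arrays would need separate handling.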

Fig. 5: Left: Complementary CDF $u \mapsto P(X > u)$ on log-scale for Source and Destination traffic hash-arrays. Right: Time
series and tail exponents of Source and Destination traffic hash-array computed over 10-second windows. Merit Network: July
22, 2015, 17:30-18:30 EST. (Panel titles: ‘Source and (negative) Destination Hash Arrays’; ‘Local Tail Exponents’.)

Alternatively, Figure 5 shows the complementary cumulative distribution function $x \mapsto P(X > x)$
on a log-scale for both Source and Destination traffic arrays. Linear scaling on this plot
corresponds to power-law behavior as in (1) and the slope of the linear fit yields an estimate of
$-\alpha$. Even though one cannot clearly talk about time-scales here, similarly to the max-spectrum
plot, one sees two regimes of power-law scaling. The heavy-tail behavior is relatively more
severe for the range of smaller values corresponding, on average, to shorter time-scales. Our
focus is on very short time scales of a few seconds to a minute. Our analysis shows that over
such time-scales, the power-law model captures the essence of the distribution. Figure 5 (right) shows
tail exponents for both Source and Destination hash arrays $X_t(i)$, $i = 1, \ldots, m$ as a function
of time $t$. Observe the persistent heavy-tailed nature of the data throughout the entire period of
time. Note that the Source (outgoing) traffic is slightly heavier-tailed (lower exponents $\alpha$) than
the Destination (incoming). Note also that the estimators of the tail exponent are rather robust to
large-volume fluctuations, e.g. in the Destination time-series. This is another important feature
of the max-spectrum, which will play a role in the successful detection of such anomalous events,
described in the following sections.



























B. Detection of Heavy Hitters 

In this section, we describe a methodology (named the ‘Fréchet method’) for aberrant behavior
discovery, based on monitoring the number of hash-bins involving heavy traffic, henceforth
referred to as heavy hitters. The precise definition of a heavy hitter is rather subtle; depending on
the traffic context, a given flow (e.g. video transmission) may be perceived as a heavy hitter 
in light traffic conditions, while it may be in fact a ‘typical’ event in normal traffic conditions. 
Here, we adopt a statistical perspective, where we flag hash-bins as heavy hitters, if their signal 
exceeds a given quantile of a baseline probability distribution, i.e., we view heavy hitters as 
‘outliers’. In order to be adaptive to changing traffic conditions, we shall dynamically and 
robustly estimate the baseline probability model from the data. Our notion of heavy hitters 
depends on the probability associated with the quantile threshold. This tuning parameter may 
be set depending on how sensitive we would like to be to ‘alarms’. 

Let $X_t = (X_t(i))_{i=1}^{m}$, $t = 1, 2, \ldots$ be a sequence of hash-binned traffic arrays. In the case
of the source IP signal, for example, $X_t(i)$ corresponds to the column-sum of the databrick
matrix and represents the number of bytes originating from all source IPs $\omega$ hashed to bin $i$,
i.e. $h(\omega) = i$, over the time-window $t$. Figure 2 (middle) shows an example of such an array. The
process of hashing effectively randomizes traffic flows in different bins, and therefore, the entries
$X_t(i)$, $i = 1, \ldots, m$ may be reasonably assumed to be statistically independent and identically
distributed (i.i.d.).

Our goal is to identify and flag the presence of abnormally large (heavy) traffic. One way
that this manifests itself is through abnormally large values of $X_t(i)$, for some $i$. To this end,
we consider the sample maximum of the hash-array:

$$D_m(X_t) := \max_{i=1,\ldots,m} X_t(i). \qquad (2)$$

We shall identify a bin $i \in \{1, \ldots, m\}$ as a heavy hitter, if its value is large, relative to an
asymptotic approximation to the distribution of the sample maximum.


Proposition 1. Let $X(i)$, $i = 1,\ldots,m$, be i.i.d. random variables with heavy tails as in (1).
Then, as $m \to \infty$, we have that

$$\frac{1}{m^{1/\alpha}} D_m(X) \equiv \frac{1}{m^{1/\alpha}} \max_{i=1,\ldots,m} X(i) \stackrel{d}{\longrightarrow} c^{1/\alpha} Z_\alpha, \tag{3}$$

where $Z_\alpha$ has the standard $\alpha$-Frechet distribution, $P(Z_\alpha \le x) = e^{-x^{-\alpha}}$, $x > 0$, and $c$ is the
asymptotic parameter in (1).


This result is a simple consequence of Theorem 3.3.7, p. 131 in [45]. For completeness, its
proof is given in the Appendix.

Relation (3) suggests that for relatively large values of $m$, we can use the limit $\alpha$-Frechet
distribution to calibrate the detection of heavy hitters. Specifically, given a sensitivity level $p_0$
(e.g. equal to 0.95 or 0.99), we flag the bin $i$ as a heavy hitter if

$$X_t(i) > T_{p_0}(m,\alpha,c) := m^{1/\alpha} c^{1/\alpha}\, \Phi_\alpha^{-1}(p_0) = m^{1/\alpha} \left( \frac{c}{\log(1/p_0)} \right)^{1/\alpha}, \tag{4}$$

where $\Phi_\alpha^{-1}(p) = (\log(1/p))^{-1/\alpha}$, $p \in (0,1)$, is the inverse of the standard $\alpha$-Frechet cumulative
distribution function $\Phi_\alpha(x) = e^{-x^{-\alpha}}$, $x > 0$. This way, in practice, under normal traffic
conditions, the probability of flagging any bin in time-slot $t$ as a heavy hitter is no greater
than $(1 - p_0)$. The rate of potential false alarms may be controlled and reduced by judiciously
increasing the level $p_0$. On the other hand, the presence of abnormally large bins relative to the
reference distribution will be flagged if their values exceed the threshold $T_{p_0}(m,\alpha,c)$.
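The thresholding rule in (4) is straightforward to evaluate. The sketch below assumes that estimates of $\alpha$ and $c$ are already available (the max-spectrum estimator is not reproduced here); the function names and the toy input array are illustrative only.

```python
import math

def frechet_threshold(m, alpha, c, p0):
    """Threshold T_{p0}(m, alpha, c) = (c * m / log(1/p0))^(1/alpha), as in (4)."""
    return (c * m / math.log(1.0 / p0)) ** (1.0 / alpha)

def flag_heavy_hitters(x, alpha, c, p0):
    """Return the (0-indexed) bins of the hash array x that exceed the threshold."""
    threshold = frechet_threshold(len(x), alpha, c, p0)
    return [i for i, value in enumerate(x) if value > threshold]

# Toy array with m = 4 bins; alpha and c are assumed to be given.
x = [3.0, 1.0, 250.0, 2.0]
hits = flag_heavy_hitters(x, alpha=1.0, c=1.0, p0=0.95)
```

Raising `p0` raises the threshold, which is how the false alarm rate is traded against sensitivity.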

To be able to use this methodology, one should estimate the key parameters $\alpha$ and $c$ appearing
in formula (4). The recently proposed max-spectrum method in [44] is particularly well-suited to
this task. It is easy to tune, robust to outliers, computationally efficient, and it provides estimates
of both the scale parameter $c$ and the tail exponent $\alpha$. This methodology is summarized in the
formal algorithm (Algorithm 1).


Remark 1. The hash-array is obtained from the PF_RING-based methodology at the time scale
of one array per several seconds. For the traffic conditions in the Merit Network (e.g. rates of
10Gbps), we found that time-windows of 10 seconds provide sufficiently well-populated bins
that lend themselves to reasonable heavy-hitter detection. In this setting, we output estimates of
heavy hitters every 10 seconds. For greater traffic rates, hash-binned arrays are populated faster
and our methodology can be applied at an even shorter, sub-second time-scale.


Remark 2. Proposition 1 is an asymptotic result. In our experiments with real traffic data, we
found the approximation based on the Frechet distribution to be reasonably accurate for $m$ as
low as 128 and durations of about 10 seconds.





Algorithm 1: Frechet method

Input: Stream of hash-arrays $X_t = \{X_t(i)\}_{i=1}^m$;
       probability level $p_0 \in (0,1)$;
       smoothing coefficient $\lambda \in (0,1)$.

Output: Stream of significant heavy-hitter bins
        $H_t \subseteq \{1,\ldots,m\}$ and their counts $k_t = |H_t|$.

 1: for each stream item $X_t$ do
 2:    Estimate the tail exponent $\hat\alpha := \hat\alpha(X_t)$ and scale
       coefficient $\hat c := \hat c(X_t)$ from the sample
       $X_t = \{X_t(i)\}_{i=1}^m$ based on the max-spectrum.
 3:    if ($t = 1$) then
 4:       Set $\alpha_t := \hat\alpha$ and $c_t := \hat c$
 5:    else
 6:       Perform EWMA smoothing:
          $\alpha_t := \lambda \hat\alpha + (1 - \lambda)\alpha_{t-1}$ and
          $c_t := \lambda \hat c + (1 - \lambda) c_{t-1}$.
 7:    end if
 8:    Compute the significance threshold
       $T_t := T_{p_0}(m, \alpha_t, c_t)$ using (4).
 9:    Estimate the set of heavy hitter bins $H_t$ at window $t$ as
       $H_t := \{ i \in \{1,\ldots,m\} : X_t(i) > T_t \}$.
10:    return $H_t$ and $k_t := |H_t|$.
11: end for


Algorithm 2: Relative volume

Input: Stream of hash-arrays $X_t = \{X_t(i)\}_{i=1}^m$; probability
       level $p_0 \in (0,1)$; candidate value $k \in \{1,\ldots,m\}$
       (preferably $\ll m$); smoothing parameter $\lambda \in (0,1)$.

Output: Binary stream of alarm-flags $f_t \in \{0,1\}$.

 1: for each stream item $X_t$ do
 2:    Estimate the tail exponent $\hat\alpha := \hat\alpha(X_t)$ from the
       sample $X_t = \{X_t(i)\}_{i=1}^m$.
 3:    if ($t = 1$) then
 4:       Set $\alpha_t := \hat\alpha$
 5:    else
 6:       Perform EWMA smoothing:
          $\alpha_t := \lambda \hat\alpha + (1 - \lambda)\alpha_{t-1}$.
 7:    end if
 8:    Compute the relative volume of the top-$k$ bins
       $V_t(k)$ as in (5).
 9:    Using Monte Carlo simulations, compute numerically
       the significance threshold $q_t = q_t(p_0; k, \alpha_t, m)$, such
       that $P(W_{\alpha_t}(k, m) \le q_t) \approx p_0$.
10:    return $f_t := I\{V_t(k) > q_t\}$, i.e., flag $V_t(k)$ as
       significantly large (at level $p_0$) if $V_t(k) > q_t$.
11: end for


C. Detection via Relative Volume 


Alternatively, one can detect high-impact events by monitoring the volume of the top-hitters
relative to the total traffic. As before, suppose that $X_t = \{X_t(i)\}_{i=1}^m$ is a hash-binned array of
traffic volume (in bytes) computed over a given time window.

Sort the bins in decreasing order, so that $X_t(i_1) \ge \cdots \ge X_t(i_k) \ge \cdots \ge X_t(i_m) \ge 0$. Fix a
$k \in \{1,\ldots,m\}$ and consider the relative volume of traffic contributed by the top-$k$ bins:

$$V_t(k) := \frac{\sum_{j=1}^{k} X_t(i_j)}{\sum_{j=1}^{m} X_t(i_j)}. \tag{5}$$

Note that the indices of the top-$k$ bins can change from one time-window to the next.

We aim to identify when $V_t(k)$ is ‘significantly’ large. For example, if $k = 1$, one would like
to know if the top bin suddenly carries a very large proportion of the traffic relative to the rest.









This could indicate an anomaly in the network. As in the previous section, we will measure
significance relative to a baseline probability model, which is dynamically estimated from the
data. The ubiquitous heavy-tailed nature of the $X_t(i)$'s will play a key role.

Let now $X(i)$, $i = 1,\ldots,m$, be i.i.d. non-negative random variables representing a generic
hash-binned traffic array. As argued in the previous section, in a wide range of regimes, the
distribution of the $X(i)$'s is heavy tailed, and they may be assumed independent because of the
pseudo-randomization due to hashing. Thus, as in (1), we shall assume that $\bar F(x) = 1 - F(x) =
P(X(1) > x) \sim c/x^{\alpha}$, for some $c > 0$ and $\alpha > 0$. It is well-known that if the distribution function
$F$ is continuous, then $U(i) := \bar F(X(i))$, $i = 1,\ldots,m$, are i.i.d. Uniform(0,1). Therefore, the
Renyi representation for the joint distribution of the order statistics (p. 189 in [45]) implies

$$\left( U_{(k)} \right)_{k=1}^{m} \stackrel{d}{=} \left( \frac{\Gamma_k}{\Gamma_{m+1}} \right)_{k=1}^{m}, \tag{6}$$

where $U_{(1)} \le \cdots \le U_{(m)}$ are the ordered $U(i)$'s and $\Gamma_k = E_1 + \cdots + E_k$ for i.i.d. standard
exponential random variables $E_1, E_2, \ldots$. This yields the following result about the distribution
of the relative volume.


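Proposition 2. (i) Under the above assumptions, we have

$$\{ V(k;m),\ k = 1,\ldots,m \} \stackrel{d}{=} \left\{ \frac{\sum_{j=1}^{k} \bar F^{-1}(\Gamma_j/\Gamma_{m+1})}{\sum_{j=1}^{m} \bar F^{-1}(\Gamma_j/\Gamma_{m+1})},\ k = 1,\ldots,m \right\}, \tag{7}$$

where $\bar F^{-1}$ denotes the (generalized) inverse of the tail function $\bar F$ and the $\Gamma_j$'s are as in (6).

(ii) Under (1), for fixed $1 \le k \le \ell$, we have, as $m \to \infty$,

$$\frac{V(k;m)}{V(\ell;m)} \stackrel{d}{\longrightarrow} W_\alpha(k,\ell) := \frac{\sum_{j=1}^{k} \Gamma_j^{-1/\alpha}}{\sum_{j=1}^{\ell} \Gamma_j^{-1/\alpha}}. \tag{8}$$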


The proof is given in the Appendix. Recall that our goal is to test whether $V(k;m)$ is
significantly large. The asymptotic result in (8) suggests that the distribution of the statistic
$W_\alpha(k,\ell)$ can be used as a baseline model. Note however that it quantifies the magnitude of
$V(k;m)$ relative to $V(\ell;m)$ for some fixed $\ell$. In practice, in the context of the network traffic
hash-binned arrays we studied, it turns out that $V(\ell;m) \approx 1$ for moderately large values of
$\ell$. Therefore, the denominator in the left-hand side of (8) can be taken as 1. Further, to be
slightly conservative, one can take $\ell = m$. We therefore obtain the distributional approximation

$$V(k;m) \stackrel{d}{\approx} W_\alpha(k,m) := \frac{\sum_{j=1}^{k} \Gamma_j^{-1/\alpha}}{\sum_{j=1}^{m} \Gamma_j^{-1/\alpha}}.$$

Note that this approximation is in fact valid exactly if $\bar F(x) = c/x^{\alpha}$, $x \ge c^{1/\alpha}$, i.e. under the
Pareto model. This discussion leads to Algorithm 2.
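The Monte Carlo calibration of the relative-volume test can be sketched as follows. This is an illustrative sketch rather than the authors' code: it simulates $W_\alpha(k,m)$ from cumulative sums of standard exponential variables and takes an empirical quantile as the threshold; the number of simulations, the seed, and the function names are assumptions, and the input array is assumed to have positive total volume.

```python
import random

def relative_volume(x, k):
    """V(k): share of the total volume carried by the k largest bins."""
    ordered = sorted(x, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

def simulate_w(alpha, k, m, rng):
    """Draw one sample of W_alpha(k, m) built from Gamma_j = E_1 + ... + E_j."""
    gamma = 0.0
    top = total = 0.0
    for j in range(1, m + 1):
        gamma += rng.expovariate(1.0)      # add one standard exponential
        term = gamma ** (-1.0 / alpha)
        total += term
        if j <= k:
            top += term
    return top / total

def mc_threshold(p0, k, alpha, m, n_sim=2000, seed=0):
    """Empirical p0-quantile of W_alpha(k, m) from n_sim simulated draws."""
    rng = random.Random(seed)
    draws = sorted(simulate_w(alpha, k, m, rng) for _ in range(n_sim))
    return draws[min(n_sim - 1, int(p0 * n_sim))]

# Flag the window if the observed relative volume exceeds the threshold.
x = [5, 1, 1, 1, 2]
q = mc_threshold(p0=0.95, k=1, alpha=1.5, m=len(x))
alarm = relative_volume(x, k=1) > q
```

In a streaming setting the threshold would be recomputed (or looked up from a precomputed table) whenever the smoothed tail exponent changes.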








Remark 3. In scenarios where the Pareto approximation is not as accurate, one can adapt
the above algorithm by considering $\ell < m$ and testing the ratios of volumes
$V(k;m)/V(\ell;m)$ relative to the baseline distribution of $W_\alpha(k,\ell)$. As indicated above, for
simplicity, and to be slightly conservative in practice, we use $\ell = m$, which worked rather well.


The significance threshold $q_t$ in Algorithm 2 may fluctuate substantially in time, since the
tail exponent $\alpha_t$ does (see, e.g. Figure 5 in [46]). This natural adaptivity property allows us
to dynamically calibrate to the changing statistical properties of the stream. It is important,
however, to be also robust to sudden changes of regime due to the onset of anomalies, i.e., we
should not adapt to the anomalies we are trying to detect. Such robustness can be achieved and
tuned by the smoothing parameter $\lambda$. The smaller the value of $\lambda$, the closer $\alpha_t$ is to the past values
$\alpha_s$, $s < t$. Some degree of smoothing can also improve estimation accuracy through borrowing
strength from the past. In practice, we found that $\lambda \approx 0.5$ works well in our conditions.

If the type of anomalies considered persists over several time windows, one can substantially
decrease the false alarm rate by considering control charts. This leads to a slight modification
of Algorithm 2. Following [47], one can consider the p-values $p_t := P(W_{\alpha_t}(k,m) > V_t(k))$, i.e.,
the baseline probability of a relative volume exceeding the observed one,
and then apply an EWMA on the z-scores: $Z_t := \lambda_p \Phi^{-1}(1 - p_t) + (1 - \lambda_p) Z_{t-1}$, for another
weight $\lambda_p \in (0,1)$, where $\Phi$ is the standard Normal CDF. Then, under baseline conditions, $Z_t$
follows the Normal distribution with zero mean and variance $\sigma_z^2 = \lambda_p/(2 - \lambda_p)$. Thus, classical
process control methodology suggests raising an alarm if $Z_t/\sigma_z > L$, for a given level parameter
$L > 0$ [13]. The pair of parameters $(\lambda_p, L)$ can be tuned so as to ensure detection of persisting
anomalies, while minimizing false alarms. Section V includes sensitivity analyses of these
calibration controls.
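The EWMA control chart on z-scores can be sketched as below. This is a minimal illustration under stated assumptions: the chart is started at $Z_0 = 0$, and p-values are clipped away from 0 and 1 by a small `eps` so that the Normal quantile stays finite; neither choice is specified in the text.

```python
import math
from statistics import NormalDist

def ewma_alarms(p_values, lam_p, L, eps=1e-12):
    """Return one alarm flag per window for a stream of baseline p-values."""
    normal = NormalDist()
    sigma_z = math.sqrt(lam_p / (2.0 - lam_p))   # baseline std. dev. of Z_t
    z = 0.0                                      # assumed starting value Z_0
    flags = []
    for p in p_values:
        p = min(max(p, eps), 1.0 - eps)          # keep the quantile finite
        z = lam_p * normal.inv_cdf(1.0 - p) + (1.0 - lam_p) * z
        flags.append(z / sigma_z > L)
    return flags

# One unremarkable window followed by two windows with very small p-values.
flags = ewma_alarms([0.5, 1e-6, 1e-6], lam_p=0.6, L=1.64)
```

A larger `L` widens the control limit, and a smaller `lam_p` makes the chart respond more slowly, so isolated small p-values are less likely to trigger an alarm.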

D. Community Detection 


Consider now the two-dimensional matrix $X_t = (X_t(i,j))_{m \times m}$ of updates, obtained for a
certain time $t$. The technique proposed next aims at detecting changes in the community structure
of the network flows. To this end, focus on the top $N$ bins of $X_t(i,j)$, $i, j = 1,\ldots,m$, which
represent an aggregate summary of the top origin-destination flows in the network.

Let $A_t = (a_t(i,j))_{m \times m}$ be a binary matrix, such that $a_t(i,j) = 1$ if and only if bin $(i,j)$
belongs to the set of top $N$ items in the array $X_t$. One can view $A_t$ as an adjacency matrix of
an oriented graph $G_t$, which is a type of histogram of the underlying (rather sparse) graph
of flows from a given sIP to a dIP that are active over the time-window of interest. Changes
in the connectivity of $G_t$ indicate changes in the community structure of the traffic flows. For
example, in the event of a DDoS or other distributed attack, a given destination IP $\omega_0$ is flooded
with a substantial amount of traffic from multiple source IPs. If a large number of sources are
involved, then this will likely result in a horizontal strip of relatively large values in the
two-dimensional hash-binned array. The location of the strip will be $i_0 := h(\omega_0)$, the index of the
bin where the target destination IP $\omega_0$ is hashed (see, e.g., Figure 7, among others, for
visualizations of the ‘Tor’, ‘SSH-scanning’ and ‘SSDP’ attack events).

One way to formally and automatically detect such features is to focus on the graph with
adjacency matrix $A_t$. In this event, the matrix $A_t$ will have a relatively larger number of 1s in
the $i_0$-th row and, correspondingly, the in-degree of node $i_0$ will be large. We propose a statistical
method for quickly identifying significant peaks in the in-degrees (or out-degrees). This method,
combined with the information from the Boyer-Moore MJRTY instances associated with the
bins involved, can lead to an almost instantaneous identification of possible targets as well as
(potential) culprits of malicious activities.

Focus on ingress connectivity, i.e., let $I_t(i) := \sum_{j=1}^{m} a_t(i,j)$, $i = 1,\ldots,m$, be the in-degree
associated with node $i$ for the oriented graph $G_t$. Our goal is to flag statistically significant
peaks of $I_t(i)$. As argued in Sections IV-B and IV-C, hashing ensures randomization and hence
$I_t(i)$, $i = 1,\ldots,m$, can be reasonably assumed to be independent. In contrast with the previous
sections, however, the distribution of the counts $I_t(i)$ is no longer heavy-tailed but rather
well-approximated by a Normal distribution. For a fixed $i$, thanks to the randomization induced by
hashing, one can view the $a_t(i,j)$'s as independent in $j$. Hence, for relatively large $m$, as well
as $N$ and ultimately traffic rate, the CLT ensures that centered and normalized integer counts
$I_t(i)$ can be modeled by the Normal distribution. Indeed, Figure (right panel, top-right plot)
shows Normal quantile-quantile plots of $I_t(i)$, $t = 1,\ldots,T$, for 5 typical (non-anomalous) bins
$i$. The linearity in the plots indicates agreement with the Normal distribution. The heatmap
therein (top-left) shows the entire array $(I_t(i))_{m \times T}$ of in-degrees computed over 10-second time
windows over the duration of 1 hour. We focused on the top $N = 3000$ flows. The bottom
plots in this figure show the QQ-plots corresponding to anomalous bins with high in-degree
corresponding to the higher intensity lines in the top-left plot.

Given the above discussion, in the baseline regime, we shall assume that $I_t(i)$, $i = 1,\ldots,m$,
are independent $N(\mu_t, \sigma_t^2)$. Then for $D_t := \max_{i=1,\ldots,m} I_t(i)$, by the independence of the $I_t(i)$'s,
we obtain $P(D_t \le x) = \Phi((x - \mu_t)/\sigma_t)^m$, where $\Phi$ is the standard normal CDF. Fix a probability level
$p_0$ (e.g. 0.99), and consider the significance threshold
$u_t(p_0) = u(p_0, m, \mu_t, \sigma_t) := \mu_t + \sigma_t \Phi^{-1}(p_0^{1/m})$.
Thus, in the baseline regime, all in-degrees $I_t(i)$, $i = 1,\ldots,m$, lie below $u_t(p_0)$ with probability
$p_0$. As in Section IV-B, we shall flag all bins $i$ for which $I_t(i)$ exceeds $u_t(p_0)$ as anomalous.
The detection algorithm is analogous to Algorithm 1, except that now one should estimate the
parameters $\mu_t$ and $\sigma_t$. This can be similarly done using an EWMA of the empirical means and
standard deviations of the samples $I_t(i)$, $i = 1,\ldots,m$. We omit the details to avoid repetition.
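The in-degree computation and the Normal-maximum threshold can be sketched as follows. The sketch is illustrative: it treats the row index as the destination bin, as in the discussion of the horizontal strip above, breaks ties among equal cell values by index order, and omits the EWMA smoothing of the mean and standard deviation; function names are hypothetical.

```python
from statistics import NormalDist

def in_degrees(x, n_top):
    """Row sums of the binary matrix marking the n_top largest cells of x."""
    m = len(x)
    cells = sorted(
        ((x[i][j], i, j) for i in range(m) for j in range(m)), reverse=True
    )
    degree = [0] * m
    for _value, i, _j in cells[:n_top]:
        degree[i] += 1
    return degree

def max_normal_threshold(p0, m, mu, sigma):
    """u(p0, m, mu, sigma) = mu + sigma * Phi^{-1}(p0^(1/m))."""
    return mu + sigma * NormalDist().inv_cdf(p0 ** (1.0 / m))

# Toy 3x3 databrick in which row 0 receives the three largest flows.
x = [[9, 8, 7],
     [1, 0, 0],
     [0, 2, 0]]
degree = in_degrees(x, n_top=3)
u = max_normal_threshold(p0=0.99, m=128, mu=20.0, sigma=4.0)
```

Because the threshold is calibrated against the maximum of $m$ independent Normal counts, it grows slowly with $m$, which keeps the per-window false alarm probability near $1 - p_0$.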

This method is illustrated in Section V, where it is successfully employed in mining seemingly
harmless events characterized by high node-connectivity (e.g., the ‘SSDP’ and ‘Tor’ cases).
These events are harder to detect via the methods described in Sections IV-B and IV-C.


Remark 4. Observe that the access to various cloud services and resources can have similar
characteristics, where multiple source IPs communicate with a single destination IP (server).
Such servers, however, are typically well-known and can be a priori filtered out. An alternative
application of this methodology is to track the up-surge of users to a particular service, such
as Twitter, Facebook or Google, for example. Such up-surges, not necessarily due to malicious
activity, may be of interest to network engineers or researchers.


V. Performance Evaluation 


A. Software performance 


The excellent performance of PF_RING is well documented [40], [41]; this section focuses
instead on our monitoring application. We perform measurements in situations where AMON
is deployed in the field, and under heavy stress-testing with a traffic generator appliance.

Figure (right) illustrates our software capabilities when monitoring traffic at Merit’s main
peering point in Chicago. Our setting involves a passive monitor (i.e., packet tap) receiving traffic
from four SPAN 10GE ports. The mirrored traffic includes both ingress and egress network
traffic, and a 5-day snapshot of aggregate volume is also shown there. Note that traffic
rates are well above 20Gbps; however, AMON monitoring traffic from all four 10GE ports










simultaneously, experienced minimal packet drops (below 1.5%). Further, the amount of physical 
memory required by our application was only around 40MB, something expected from the low 
space complexity of the Boyer-Moore algorithm p9| . 

To shed more light on this, we undertook performance tests using a traffic generator with
40 byte payload packets (i.e., sending at the minimum frame of 64 bytes). At wire-speed of
10Gbps we measured throughput that exceeded 12 Mpps (million packets per second). This
corresponds to a drop rate of 18%; testing with payloads of size 96 and above showed zero loss
at wire speeds. We conjecture that the bottleneck is the buffer size of the NIC
we used, and not PF_RING. In particular, the Intel card we tested has buffers of size 4096, and
hence packet drops seem to be inevitable at these rates. We are in the process of conducting
tests on cards with larger buffers in order to verify our hypothesis.
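The relation between the measured throughput and the reported drop rate can be checked with a short calculation. The sketch assumes the standard 20 bytes of per-frame Ethernet overhead on the wire (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap), which is not stated in the text.

```python
LINK_BPS = 10e9          # 10 Gbps link
FRAME_BYTES = 64         # minimum Ethernet frame
OVERHEAD_BYTES = 20      # assumed: preamble + SFD (8) + inter-frame gap (12)

# Maximum packet rate for minimum-size frames at wire speed.
line_rate_pps = LINK_BPS / ((FRAME_BYTES + OVERHEAD_BYTES) * 8)

# Packet rate that survives an 18% drop rate.
captured_pps = (1 - 0.18) * line_rate_pps

print(f"line rate: {line_rate_pps / 1e6:.2f} Mpps")
print(f"captured:  {captured_pps / 1e6:.2f} Mpps")
```

Under this assumption the wire-speed rate is about 14.88 Mpps, so an 18% drop rate is consistent with a captured rate slightly above 12 Mpps.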

B. Identification accuracy 


Next, we demonstrate the identification accuracy of MJRTY Boyer-Moore; we perform
comparisons against Combinatorial Group Testing (CGT) [19]. We utilize an hour-long NetFlow
dataset, collected at Merit, with 92 million flows and an aggregate volume of 447 GBytes and
around 580 million packets. Both methods are evaluated against the ground truth (i.e., exact
recovery of top-K hitters). All methods report their answers every 100,000 NetFlow records;
Table II reports the average proportion of identified heavy hitters among the top-K and the
standard error (in parentheses). The chosen data contain a low-volume DDoS attack attributed
to the Simple Service Discovery Protocol (SSDP); see Figure (left).

The CGT method [19] is a probabilistic technique, based on the ideas of ‘group testing’. It
aims at finding the elements whose volume is at least $1/(k + 1)$ of the total; this is a relaxed
version of the top-$k$ hitters problem. The authors provide performance guarantees with respect
to accuracy, space and time. It is suited for high-speed streaming data; indeed, besides its offline
evaluation on accuracy, we have implemented the method in the AMON framework and verified
its time and space efficiency. Its online realization demonstrated results similar to Figure 1b.

For the results of Table II, we sought the top source IPs per interval. The tuning parameters for
CGT include the hash-table size W, and the number of groups T (in all experiments, T = 2;
increasing T improves accuracy but worsens the efficiency on real data). The granularity unit b



TABLE II: Identification; comparison with Combinatorial Group Testing (CGT) [19].

Top-K hitters   | BM (m=512) | BM (m=1024) | CGT (k=500, W=1024) | CGT (k=1000, W=2048) | CGT (k=2000, W=4096)
----------------+------------+-------------+---------------------+----------------------+---------------------
K=10 (packets)  | 0.99(0.04) | 0.99(0.03)  | 0.91(0.10)          | 0.91(0.10)           | 0.89(0.09)
K=10 (bytes)    | 0.99(0.03) | 0.99(0.03)  | 0.79(0.14)          | 0.85(0.12)           | [illegible]
K=50 (packets)  | 0.94(0.04) | 0.98(0.02)  | 0.90(0.07)          | 0.90(0.05)           | 0.90(0.04)
K=50 (bytes)    | 0.96(0.03) | 0.98(0.02)  | 0.80(0.11)          | 0.89(0.06)           | 0.90(0.05)
K=100 (packets) | 0.85(0.05) | 0.94(0.03)  | 0.60(0.11)          | 0.89(0.05)           | 0.90(0.03)
K=100 (bytes)   | 0.90(0.03) | 0.96(0.02)  | 0.58(0.09)          | 0.86(0.09)           | 0.92(0.04)
K=200 (packets) | 0.71(0.05) | 0.87(0.03)  | 0.30(0.06)          | 0.76(0.12)           | 0.91(0.03)
K=200 (bytes)   | 0.77(0.04) | 0.90(0.02)  | 0.29(0.05)          | 0.48(0.08)           | 0.78(0.13)

(see [19], Sec. 3.3) is set to b = 8 for better efficiency, at the expense of space in memory.

The MJRTY BM is regulated by the number of sub-streams, m. Note that, regarding space
utilization, hashing with size m = 1024 corresponds to the CGT case W = 1024. In all cases,
we used W > 2k, per Lemma 3.3 in [19]. The results of Table II show that MJRTY BM is
highly accurate in finding the most frequent elements of the stream. It also often outperforms
its competitor. CGT’s performance can improve with larger values of W and T, at the expense
of space and, most importantly, time. However, MJRTY BM can increase its accuracy too,
by increasing m. Finally, we note that CGT, being a probabilistic algorithm, may output IP
elements that are not present in the stream (due to hash collisions). In contrast, MJRTY BM is
not susceptible to this.
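The per-sub-stream majority vote can be sketched as follows. This is the classical, unweighted Boyer-Moore MJRTY vote run independently in each of the $m$ hash bins; the CRC32 hash is a stand-in, and AMON's actual variant (for example, how it weights items by volume) is not reproduced here.

```python
import zlib

def mjrty_per_bin(stream, m):
    """Run one Boyer-Moore majority vote per hash bin; return the candidates."""
    candidate = [None] * m
    count = [0] * m
    for item in stream:
        b = zlib.crc32(item.encode("ascii")) % m   # stand-in hash function
        if count[b] == 0:
            candidate[b], count[b] = item, 1       # adopt a new candidate
        elif candidate[b] == item:
            count[b] += 1                          # vote for the candidate
        else:
            count[b] -= 1                          # vote against it
    return candidate

# With a single bin, the majority item of the stream is recovered.
result = mjrty_per_bin(["a", "b", "a", "c", "a"], m=1)
```

Each bin keeps only one candidate and one counter, and every reported candidate is an item that actually occurred in the stream.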

C. Detection accuracy 

We shed light on the detection accuracy of our methods by considering real-world DDoS
case studies as well as synthetic attacks injected on real data. The studied attacks were recorded
at Merit’s NetFlow collector. The first event, labeled as the ‘Library’ case study, involved heavy
UDP-based DNS and NTP traffic to an IP registered to a public library in Michigan, and is
considered a volumetric attack (see Figure 6). The second event, named ‘SSDP’, is a low-volume
attack directed to another host within the network (see Figure, left).


We implemented the Defeat [26] subspace method and juxtapose its performance against
our algorithms on the two attacks. Defeat checks for anomalies using principal component
analysis (PCA). A dictionary of entropies is built, over moving windows, of distributions of
certain signature signals involving source and destination ports and IP addresses. In Defeat,
abnormalities are viewed as unusual distributions of these features. The Defeat framework is
well-suited for multi-dimensional data due to its sketch-based design, and can be utilized for
detection, identification and classification of attacks. However, its requirement for construction
of empirical histograms makes it less appealing (if feasible at all) for online realization.

Table III tabulates our analyses. We report two metrics, namely precision and recall. Precision
is the fraction of alerts raised that are indeed relevant, and recall is the fraction of actual
anomalies that were detected. The ground truth (i.e., instances when the attack was ongoing)
was obtained by offline data analysis that revealed the times when the target IPs and the
corresponding protocol ports ranked among the top-10. Again, we considered time windows
of 100,000 flow records. Because additional attacks unknown to us might be present
in the data, the precision criterion should be interpreted as a worst-case lower bound.
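The two metrics can be computed from the sets of alerted and truly anomalous time windows, as in the sketch below. The window indices are hypothetical, and returning 0.0 when a denominator is empty is a convention chosen for the example.

```python
def precision_recall(alert_windows, attack_windows):
    """Precision and recall of alerts against ground-truth attack windows."""
    alerts, truth = set(alert_windows), set(attack_windows)
    true_positives = len(alerts & truth)
    precision = true_positives / len(alerts) if alerts else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Alerts raised in windows 1-4; the attack was active in windows 2, 3 and 5.
precision, recall = precision_recall([1, 2, 3, 4], [2, 3, 5])
```

If the ground truth misses attacks that the detector did flag, those alerts count as false positives, which is why the reported precision is a lower bound.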

The Defeat method was calibrated by the significance level $\alpha$ (necessary for its monitoring
statistic threshold) and the number of ‘votes’ raised by Defeat's internal detection processes. A
sketch of size 484 was employed. For our system, we ran all three detection methods and reported
their results; we also demonstrate the overall AMON accuracy by accounting for the union of alerts.
As illustrated in Table III, both Defeat and AMON perform remarkably well on ‘Library’. Recall
that this event is a voluminous one, and hence both techniques can easily detect it. On the other
hand, the ‘SSDP’ case is a harder one (see Figure [^. Defeat reports true alerts for a higher 
time fraction, and both methods show consistent and similar false positive rates. The fact that 
Defeat checks for more traffic features than our methods seems to be the explanation of its 
higher attack discovery rate. However, we emphasize that AMON indeed rapidly uncovered the 
underlying event during its period of appearance. Further, it is extensible and adding monitoring 
features like source and destination ports into its design is straightforward. 

The Defeat method works well but is rather sophisticated and requires a training period. This
training process is computationally intensive and has to be performed offline. Moreover, the
construction of signal distributions and the entropy calculations required to run the PCA-based
detection need very large memory structures, which do not scale well in real network conditions.
Making this method work in real-time requires a formidable and independent effort. Further,
finding a sufficiently long, anomaly-free period that satisfies the stationarity assumption might

TABLE III: Detection accuracy; comparison with Defeat [26]

(a) The Library case study

Method                              | Prec. | Recall
------------------------------------+-------+-------
Defeat (α = 0.01)                   | 0.95  | 0.94
Defeat (α = 0.001)                  | 0.80  | 0.95

Frechet (p = 0.95, λ_α = 0.6)       | 0.89  | 0.22
Rel. Vol. (λ_p = 0.6, L = 1.64)     | 0.80  | 0.48
Connectivity (p = 0.9999)           | 0.74  | 0.95
AMON (all methods)                  | 0.94  | 0.93

Frechet (p = 0.85, λ_α = 0.6)       | 0.65  | 0.41
Rel. Vol. (λ_p = 0.6, L = 1.64)     | 0.80  | 0.48
Connectivity (p = 0.9999)           | 0.74  | 0.95
AMON (all methods)                  | 0.94  | 0.93

Frechet (p = 0.95, λ_α = 0.6)       | 0.89  | 0.22
Rel. Vol. (λ_p = 0.6, L = 2)        | 0.80  | 0.29
Connectivity (p = 0.9999)           | 0.74  | 0.95
AMON (all methods)                  | 0.95  | 0.93

(b) The SSDP case study

Method                              | Prec. | Recall
------------------------------------+-------+-------
Defeat (α = 0.001, 9 votes)         | 0.35  | 0.80
Defeat (α = 0.001, 10 votes)        | 0.31  | 0.21

Frechet (p = 0.95, λ_α = 0.6)       | 0.40  | 0.03
Rel. Vol. (λ_p = 0.6, L = 1.64)     | 0.47  | 0.12
Connectivity (p = 0.9999)           | 0.34  | 0.45
AMON (all methods)                  | 0.33  | 0.46

Frechet (p = 0.85, λ_α = 0.6)       | 0.36  | 0.13
Rel. Vol. (λ_p = 0.6, L = 1.64)     | 0.47  | 0.12
Connectivity (p = 0.9999)           | 0.34  | 0.45
AMON (all methods)                  | 0.33  | 0.47

Frechet (p = 0.95, λ_α = 0.6)       | 0.40  | 0.03
Rel. Vol. (λ_p = 0.6, L = 2)        | 0.52  | 0.06
Connectivity (p = 0.9999)           | 0.34  | 0.45
AMON (all methods)                  | 0.34  | 0.46


be challenging. Its adaptability in dynamically changing traffic conditions is another concern. 
In contrast, our methods are highly adaptive to traffic trends and require no training. 

To gain insight into AMON’s sensitivity to various tuning parameters, we turn to Table IV
and Table V. For this evaluation, we utilized data collected during a seemingly ordinary period.
We randomly injected attacks of various magnitudes at 5 times; the injected traffic volume is added
directly to the databrick matrices. We first considered the scenario of many sources sending
traffic to one destination. In this scenario, one databrick row is ‘inflated’ by the synthetic attack
magnitude at 5 random instances. The algorithm input was the destinations’ hash-binned arrays
(see Figure), and we allowed a grace period of 3 minutes for detection. Each individual
experiment was repeated 50 times and we report the average performance in terms of precision,
$P_d^{(1)}$, and recall, $R_d^{(1)}$; the sub-script ‘d’ denotes that the algorithm input was the destinations’ signal.
We also considered the scenario of one source communicating with multiple destinations (see
$P_s^{(2)}$ and $R_s^{(2)}$) and the case of several sources to various destinations (rightmost four columns).

Table IV tabulates results for Algorithm 1, which is tuned by the significance level $p_0 = p$ and
the smoothing parameter $\lambda = \lambda_\alpha$. Best performance is achieved with $p = 0.95$ and $\lambda_\alpha = 0.50$.
Note that $p$ may be calibrated to ease the false alarm rate. Further, observe the connection














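TABLE IV: Frechet method (Algorithm 1)

p    | λ_α  | Gbps | P_d^(1) | R_d^(1) | P_s^(2) | R_s^(2) | P_s^(3) | R_s^(3) | P_d^(3) | R_d^(3)
-----+------+------+---------+---------+---------+---------+---------+---------+---------+--------
0.95 | 0.50 | 0.50 | 0.74    | 1.00    | 1.00    | 0.96    | 1.00    | 1.00    | 0.74    | 1.00
0.95 | 0.50 | 1.50 | 0.73    | 1.00    | 1.00    | 1.00    | 1.00    | 1.00    | 0.74    | 1.00
0.95 | 0.50 | 2.50 | 0.73    | 1.00    | 1.00    | 0.98    | 1.00    | 1.00    | 0.74    | 1.00
0.95 | 0.60 | 0.50 | 0.72    | 0.93    | 1.00    | 0.61    | 1.00    | 1.00    | 0.85    | 1.00
0.95 | 0.60 | 1.50 | 0.74    | 0.99    | 1.00    | 0.88    | 1.00    | 0.99    | 0.85    | 0.99
0.95 | 0.60 | 2.50 | 0.73    | 1.00    | 1.00    | 0.92    | 1.00    | 1.00    | 0.85    | 1.00
0.99 | 0.50 | 0.50 | 1.00    | 0.73    | 0.74    | 0.22    | 1.00    | 1.00    | 1.00    | 0.99
0.99 | 0.50 | 1.50 | 1.00    | 0.94    | 0.98    | 0.50    | 1.00    | 1.00    | 1.00    | 1.00
0.99 | 0.50 | 2.50 | 1.00    | 0.97    | 1.00    | 0.71    | 1.00    | 0.99    | 1.00    | 0.99
0.99 | 0.60 | 0.50 | 0.76    | 0.24    | 0.36    | 0.09    | 1.00    | 1.00    | 1.00    | 1.00
0.99 | 0.60 | 1.50 | 0.92    | 0.40    | 0.08    | 0.02    | 1.00    | 1.00    | 1.00    | 1.00
0.99 | 0.60 | 2.50 | 1.00    | 0.55    | 0.16    | 0.04    | 1.00    | 1.00    | 1.00    | 0.99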

between robustness and adaptivity as dictated by $\lambda_\alpha$. Recall that this parameter is used to
smooth the heavy-tail exponent, $\alpha$. Big traffic spikes translate to a heavier distribution tail and
thus a low $\alpha$; this could make our scheme too insensitive/conservative if we do not have an
adaptive scheme that accounts for ‘historical’ $\alpha$’s. On the other hand, a high $\alpha$ can make our
scheme too sensitive (i.e., many false positives). Table V illustrates the detection performance
of a modification of Algorithm 2 which utilizes EWMA control charts on z-scores, as explained
at the end of Section IV-C. We employ our methodology for the EWMA $(\lambda_p, L)$ pairs shown,
and $\lambda = \lambda_\alpha$ was fixed to 0.5. For this evaluation, our synthetic attacks were persistent for 5
consecutive time slots (i.e., 50 seconds). Users can tame the alert rate by widening the control
limits with a higher $L$ and/or by further decreasing $\lambda_p$.

In addition, we undertook sensitivity analysis with respect to various choices of monitoring 


intervals used to generate databrick matrices. Table VI illustrates that at the relevant time-scale 


of interest (e.g., few seconds), detection accuracy remains relatively unchanged. For optimal 
performance, very brief aggregation intervals for low-traffic links are discouraged because the 
hash-binned arrays will be sparse. Similarly, very large aggregation levels (e.g., minutes) are also 









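TABLE V: Relative volume method (Algorithm 2)

L    | λ_p  | Gbps | P_d^(1) | R_d^(1) | P_s^(2) | R_s^(2) | P_s^(3) | R_s^(3) | P_d^(3) | R_d^(3)
-----+------+------+---------+---------+---------+---------+---------+---------+---------+--------
2.00 | 0.50 | 0.50 | 0.45    | 0.98    | 0.61    | 0.97    | 0.49    | 0.99    | 0.35    | 0.99
2.00 | 0.50 | 1.50 | 0.45    | 0.99    | 0.62    | 0.96    | 0.41    | 0.99    | 0.33    | 1.00
2.00 | 0.50 | 2.50 | 0.45    | 0.98    | 0.64    | 0.98    | 0.41    | 0.99    | 0.34    | 1.00
2.00 | 0.60 | 0.50 | 0.61    | 0.98    | 0.80    | 0.96    | 0.74    | 0.99    | 0.45    | 1.00
2.00 | 0.60 | 1.50 | 0.62    | 0.97    | 0.81    | 0.99    | 0.50    | 0.99    | 0.41    | 0.99
2.00 | 0.60 | 2.50 | 0.60    | 0.98    | 0.81    | 0.97    | 0.48    | 0.99    | 0.40    | 0.99
3.00 | 0.50 | 0.50 | 1.00    | 0.62    | 0.76    | 0.25    | 0.95    | 0.99    | 0.78    | 0.99
3.00 | 0.50 | 1.50 | 0.98    | 0.74    | 0.62    | 0.16    | 0.73    | 0.98    | 0.60    | 0.98
3.00 | 0.50 | 2.50 | 1.00    | 0.81    | 0.90    | 0.36    | 0.58    | 0.98    | 0.55    | 0.99
3.00 | 0.60 | 0.50 | 0.96    | 0.42    | 0.50    | 0.13    | 0.99    | 0.98    | 0.92    | 0.98
3.00 | 0.60 | 1.50 | 1.00    | 0.59    | 0.40    | 0.10    | 0.79    | 1.00    | 0.65    | 1.00
3.00 | 0.60 | 2.50 | 1.00    | 0.72    | 0.56    | 0.13    | 0.71    | 1.00    | 0.59    | 1.00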



Fig. 6: The ‘Library’ event. The left panel shows the time-series of the hashed-array for destinations for a period of one 
hour. Note the dark horizontal stripe (at bin 82) between minutes 30 and 45 and towards the end of the hour. The adjacent 
panel depicts the aggregate traffic volume over the hour of interest. Observe the elevated traffic volume and note that both the 
Frechet and the relative volume methods correctly raised alerts (red) during the malicious activity period (rightmost figures). 


unwelcome since short-duration attacks might be masked by collisions with other events. Further,
the heavy-tail modeling assumption is not suitable on large time-scales (see Section IV-A).

We conclude this section by showcasing that AMON’s identification may be coupled
synergistically with the detection alerts to guide operators’ troubleshooting and mitigation efforts. By
looking at the database of heavy hitters reported by MJRTY BM, network managers can readily
obtain an IP list of candidate culprits. Upon detection, our algorithm outputs a databrick bin,
aimed at identifying culprits. One can then examine the flows reported by Boyer-Moore for the
sub-streams associated with the relevant hash bin. For example, one first sorts these Boyer-




















TABLE VI: Sensitivity on aggregation level (‘Library’ case)

                      | Frechet method (p = 0.85, λ_α = 0.6) | Relative Volume (λ_p = 0.6, L = 1.64)
Aggregation Level     | Precision | Recall                   | Precision | Recall
----------------------+-----------+--------------------------+-----------+-------
200K NetFlow records  | 0.74      | 0.52                     | 0.82      | 0.53
300K NetFlow records  | 0.76      | 0.73                     | 0.87      | 0.70
400K NetFlow records  | 0.92      | 0.71                     | 0.81      | 0.75
500K NetFlow records  | 0.81      | 0.73                     | 0.70      | 0.76


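TABLE VII: ‘Library’ case study: culprit identification using MJRTY Boyer-Moore.

Top-K                                       | 1    | 2    | 4    | 8    | 16   | 32   | 64  | 128
--------------------------------------------+------+------+------+------+------+------+-----+----
Frechet method (Alg. 1) - time fraction (%) | 39.7 | 60.3 | 74.0 | 90.4 | 93.2 | 98.6 | 100 | 100
Rel. Vol. (Alg. 2) - time fraction (%)      | 42.7 | 61.3 | 74.7 | 90.7 | 93.3 | 98.7 | 100 | 100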


Moore-identified flows based on their traffic volume estimate ($P_{est}$ in the Boyer-Moore algorithm), and then
proceeds with forensics analysis. In particular, for the ‘Library’ event, in 39.7% of the times
flagged by Algorithm 1 the top ranked flow by Boyer-Moore was indeed one of the (src, dst)
pairs of interest (see Table VII, top row). For 60.3% of the flagged times, the same detection
method was able to identify flow(s) of interest among the top-two reported BM flows, etc.
Similar results are offered by the second detection algorithm in the bottom row.

D. Diagnosing low-volume attacks 


In addition to orchestrated, volumetric attacks that seek to overwhelm the victims with traffic,
low-volume DDoS attacks can be pernicious, albeit difficult to detect. Such attacks, like the
‘SSDP’ event previously analyzed, rely on presumably innocuous message transmissions to
thwart standard anomaly detection methods. In this section, we highlight the importance of
visualizations and of methods that detect structural patterns in traffic (see Section IV-D) in
revealing these low-profile nefarious actions. As an initial example, consider Figure 8 (bottom
left). The bold horizontal line in the depicted databrick is an artifact of the distributed nature
of the ‘SSDP’ attack. Operators can easily and instantaneously observe such patterns
by monitoring AMON’s live data products. Further, we reiterate the important role that the
connectivity algorithm played in automatically uncovering the ‘SSDP’ instances (see Table IIIb).

Figure 7 depicts another case study of this kind, in which sparse traffic patterns (left panel)
cause the two volume-based detection algorithms to miss these events. As the in-degree counters of
Figure 7 (middle) clearly show, two possibly suspicious events are occurring. Manual inspection
revealed the first event to be UDP misuse affecting a Tor exit router within Merit, and the other
(longer-running) event to be SSH break-in attempts against Michigan-located servers from IPs that
belong to an autonomous system registered in the Asia-Pacific region. We refer to this case study
as ‘Tor’. The right panel illustrates that both events were flagged by our community detection
system. This plot shows the number of highly connected destinations (i.e., those with high
in-degree) over the duration of an hour. The correct hash bins (22 and 53) were also reported.
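
To make the plotted quantity concrete, the sketch below counts, for a single time window, how many destination bins have a high in-degree. It is an illustration only: the edge-set representation, the function name and the `min_in_degree` threshold are assumptions, and the code does not reproduce the detection method of Section IV-D.

```python
from collections import Counter


def highly_connected_destinations(edges, min_in_degree):
    """Destination bins contacted by at least `min_in_degree` distinct source bins.

    `edges` is an iterable of (src_bin, dst_bin) pairs observed in one time
    window. Duplicate pairs are ignored, mirroring a binary connectivity matrix.
    """
    in_degree = Counter(dst for _, dst in set(edges))
    return {dst: deg for dst, deg in in_degree.items() if deg >= min_in_degree}


# Toy window: 40 source bins contact destination bin 22, two contact bin 5.
window = [(s, 22) for s in range(40)] + [(1, 5), (2, 5), (1, 5)]
suspects = highly_connected_destinations(window, min_in_degree=10)
```

Tracking `len(suspects)` from one window to the next yields a time series of the kind shown in the right panel, and the keys of `suspects` are the bins an operator would inspect first.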

Further, Figure 8 (right panels) demonstrates additional visualization aids readily available from
our data products: cliques and clique sizes for the source and destination co-connectivity
graphs. The co-connectivity graph for sources provides insight into the number
of common destinations between two sources; the destination co-connectivity graph provides
the analogous information for destinations. To obtain these (undirected) graphs we utilize the binary
matrix $A_t = (a_t(i,j))_{m \times m}$ (see Section IV-D). The co-connectivity graph for destinations is
efficiently obtained as $D_t := A_t A_t^\top$; the graph for sources is $S_t := A_t^\top A_t$, where $A_t^\top$ is the
transpose of $A_t$. Based on the co-connectivity graphs one can visualize the
cliques formed over time. Figure 8 showcases two such snapshots. These graphs are portrayed
via their adjacency matrices, with the nodes relabeled in decreasing order of
node degree (i.e., the first row represents the adjacency associations of
the node with the highest degree). Note the very large clique formed in the src-to-src graph. This
depicts the situation in the ‘Tor’ case study discussed above, when a plethora of sources were
contacting the same destination (the Tor exit router). One may also extract the maximum clique
of each graph; the bottom row shows its size over time. The reader is pointed
to our supplement [48] for an animated version of Figure 8, where one can clearly observe the
clique sizes evolving and expanding over the duration of the ‘Tor’ event.
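
The construction of the co-connectivity graphs and the extraction of a maximum clique can be sketched in a few lines. The sketch rests on assumptions that are ours, not the paper's: rows of the binary matrix index destination bins and columns index source bins, two nodes are adjacent when their co-connectivity count is positive, and the clique search uses plain Bron-Kerbosch with pivoting, which is adequate only for small m.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def co_connectivity(a):
    """Return (D, S) = (A A^T, A^T A) for a binary matrix A.

    Assumed orientation: a[i][j] = 1 if destination bin i received traffic
    from source bin j. D[i][k] then counts the sources shared by destinations
    i and k, and S[j][k] counts the destinations shared by sources j and k.
    """
    at = transpose(a)
    return matmul(a, at), matmul(at, a)


def adjacency(counts):
    """Undirected graph linking distinct nodes with a positive count."""
    n = len(counts)
    return {i: {j for j in range(n) if j != i and counts[i][j] > 0}
            for i in range(n)}


def max_clique(adj):
    """Largest clique found by Bron-Kerbosch with pivoting."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# Toy window: sources 0, 1 and 2 all contact destination 3,
# while source 3 contacts destination 0.
A = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
D, S = co_connectivity(A)
source_clique = max_clique(adjacency(S))
```

In this toy window the three sources that share a destination form a clique in the source graph, which is the same mechanism behind the large src-to-src clique of the ‘Tor’ event.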


VI. Conclusions 

The paper presents a novel open source monitoring architecture suitable for multi-gigabit
(i.e., 10Gbps+) network traffic streams. It is based on PF_RING Zero-Copy and tailored for
deployment on commodity hardware for troubleshooting high-impact events that may arise from
malicious actions such as DDoS attacks. It is worth noting that the NIC needed for processing
traffic data at speeds around 25Gbps (Figure 1) costs roughly 800 USD, while specialized
FPGA-accelerated cards or monitoring appliances cost an order of magnitude more (above 10,000
USD). We demonstrated the performance of our system architecture and the underlying statistical
methods on selected real-world case studies and measurements from the Merit Network.

Fig. 7: Low-volume attacks (‘Tor’ and ‘SSH-scanning’ events): community detection.

Fig. 8: Visualizations readily available from our data products. Left: Merit Network 10:00-12:00 EST, Dec 9, 2015; the
‘SSDP’ event, faint in volume, is clearly observable in the time snapshot of the databrick matrix (horizontal line). Top Right:
Adjacency matrices of co-connectivity graphs (node indices sorted by degree; black corresponds to locations of 1’s). Bottom
Right: Size of max cliques over time during the ‘Tor’ case study (Section V-D). By observing clique size changes in dashboards
like this, coupled with the detection method of Section IV-D, such seemingly innocuous low-volume events are captured.


Our framework is extensible, and allows for further statistical, filtering and visualization
modules. Currently, we are in the process of deploying an interactive filtering mode of operation
that would enable network operators to zoom in and examine IP ranges of interest in real
time. As an example, consider the ‘Tor’ and ‘SSH-scanning’ events of Figure 7, for which
our methods automatically flagged bins 22 and 53. With the filtering option, operators can
rapidly zoom exclusively into the sub-stream of traffic that gets mapped into the flagged bins.
Note that due to randomization, these hash bins are not associated with traditional IP ranges
(e.g., subnets or specific IP addresses). Thus, such filters cannot be implemented using existing
filtering infrastructures such as BPF or hardware-based filters.

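A bin-based filter therefore has to re-apply the monitor's hash function to every packet instead of matching on address prefixes. The sketch below illustrates the idea with a Carter-Wegman style hash in the spirit of [38]. The constants `a` and `b`, the choice of m = 128 bins, the packet dictionaries and all function names are assumptions for this example and do not describe AMON's actual hash or packet format.

```python
import ipaddress

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any IPv4 address


def make_hash(a, b, m):
    """Carter-Wegman style hash mapping an IPv4 address to one of m bins."""
    def h(ip):
        x = int(ipaddress.IPv4Address(ip))
        return ((a * x + b) % PRIME) % m
    return h


def bin_filter(packets, h, flagged_bins, key="src"):
    """Yield only the packets whose `key` address hashes into a flagged bin."""
    for pkt in packets:
        if h(pkt[key]) in flagged_bins:
            yield pkt


# Toy traffic: one packet from every address of a single /24.
pkts = [{"src": f"192.0.2.{i}", "dst": "198.51.100.1"} for i in range(256)]
h = make_hash(a=1_000_003, b=12_345, m=128)

flagged = {h("192.0.2.1")}
selected = list(bin_filter(pkts, h, flagged))
bins_touched = {h(p["src"]) for p in pkts}
```

With these toy constants the addresses of one /24 spread over all 128 bins, so a flagged bin selects a scattered set of hosts that no prefix-based rule could express.
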
Appendix


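Proof of Proposition: This result is a simple consequence of Theorem 3.3.7, p. 131 in [45].

Proof: By the independence of the X(i)’s, for all fixed x > 0, we have

$$P\Big(m^{-1/\alpha} \max_{1 \le i \le m} X(i) \le x\Big) = P\big(X \le m^{1/\alpha} x\big)^m = \big(1 - P(X > m^{1/\alpha} x)\big)^m.$$

Now, by (1) with x replaced by $m^{1/\alpha} x$, we observe that $P(X > m^{1/\alpha} x) \sim c/(m x^{\alpha})$, as $m \to \infty$.
Thus, using the fact that $(1 - c x^{-\alpha}/m)^m \to e^{-c x^{-\alpha}}$, $m \to \infty$, we conclude that

$$P\Big(m^{-1/\alpha} \max_{1 \le i \le m} X(i) \le x\Big) \longrightarrow e^{-c x^{-\alpha}}, \quad \text{as } m \to \infty.$$

This implies the desired convergence, since $P(c^{1/\alpha} Z_\alpha \le x) = e^{-c x^{-\alpha}}$, $x > 0$. ∎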
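
The limit in the proof can be checked numerically in a case where the finite-m probability has a closed form. The sketch below assumes an exact Pareto tail, $P(X > u) = c\,u^{-\alpha}$ for $u \ge c^{1/\alpha}$. The values of α, c and x are arbitrary illustration choices, and the function names are ours.

```python
import math


def max_cdf(m, x, alpha, c):
    """P(m**(-1/alpha) * max(X_1, ..., X_m) <= x) for i.i.d. Pareto X_i
    with P(X > u) = c * u**(-alpha) for u >= c**(1/alpha)."""
    u = m ** (1.0 / alpha) * x
    tail = min(1.0, c * u ** (-alpha))
    return (1.0 - tail) ** m


def frechet_cdf(x, alpha, c):
    """Limit law: P(c**(1/alpha) * Z_alpha <= x) = exp(-c * x**(-alpha))."""
    return math.exp(-c * x ** (-alpha))


alpha, c, x = 1.5, 2.0, 1.3
limit = frechet_cdf(x, alpha, c)
errors = [abs(max_cdf(m, x, alpha, c) - limit) for m in (10, 1000, 10**6)]
```

The entries of `errors` shrink as m grows, in line with the convergence $(1 - c x^{-\alpha}/m)^m \to e^{-c x^{-\alpha}}$ used in the proof.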

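Proof of Proposition

Proof: Part (i) is immediate. Now, to prove (ii), observe that

$$\frac{V(k;m)}{V(\ell;m)} \stackrel{d}{=} \frac{\sum_{j=1}^{k} F^{-1}(\Gamma_j/\Gamma_{m+1})}{\sum_{j=1}^{\ell} F^{-1}(\Gamma_j/\Gamma_{m+1})}.$$

By the Strong Law of Large Numbers, we have that $\Gamma_j/\Gamma_{m+1} \sim \Gamma_j/m$, as $m \to \infty$, almost
surely, for all $j = 1, \ldots, \ell$. Recall that $\ell$ is fixed. Thus, in view of the tail assumption
$P(X > x) \sim c\,x^{-\alpha}$, we have $F^{-1}(p) \sim (p/c)^{-1/\alpha}$ as $p \downarrow 0$, and hence for all
$j = 1, \ldots, \ell$, with probability one,

$$F^{-1}(\Gamma_j/\Gamma_{m+1}) \sim (c\,m)^{1/\alpha}\, \Gamma_j^{-1/\alpha}, \quad \text{as } m \to \infty.$$

This implies that the right-hand side of the ratio above converges almost surely to

$$\frac{\sum_{j=1}^{k} \Gamma_j^{-1/\alpha}}{\sum_{j=1}^{\ell} \Gamma_j^{-1/\alpha}} = W_\alpha(k, \ell),$$

which completes the proof. ∎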


References 

[1] A. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss, “Surfing wavelets on streams: one-pass summaries for
approximate aggregate queries,” in Proceedings of VLDB, Rome, Italy, 2001.

[2] S. Muthukrishnan, “Data streams: Algorithms and applications,” Found. Trends Theor. CS, vol. 1, no. 2, Aug. 2005. 

[3] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, “Sketch-based change detection: methods, evaluation, and applications,” 
in 3rd ACM SIGCOMM IMC, NY, USA, 2003, pp. 234-247. 

[4] A. C. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin, “One sketch for all: Fast algorithms for compressed sensing,” 
in STOC ’07, NY, USA, 2007, pp. 237-246. 

[5] S. Stoev, M. Hadjieleftheriou, G. Kollios, and M. Taqqu, “Norm, point, and distance estimation over multiple signals 
using max-stable distributions,” in IEEE 23rd International Conference on Data Engineering, April 2007, pp. 1006-1015. 

[6] B. Xi, G. Michailidis, and V. N. Nair, “Estimating network loss rates using active tomography,” J. Amer. Statist. Assoc., 
vol. 101, no. 476, pp. 1430-1448, 2006. 

[7] E. Lawrence, G. Michailidis, V. N. Nair, and B. Xi, “Network tomography: a review and recent developments,” in Frontiers
in statistics. London: Imp. Coll. Press, 2006, pp. 345-366.

[8] S. Stoev, M. Taqqu, C. Park, G. Michailidis, and J. S. Marron, “LASS: a tool for the local analysis of self-similarity,” 
Computational Statistics and Data Analysis, vol. 50, pp. 2447-2471, 2006. 

[9] S. Stoev and G. Michailidis, “On the estimation of the heavy-tail exponent in time series using the max-spectrum,” 
Applied Stochastic Models in Business and Industry, vol. 26, no. 3, pp. 224-253, 2010. 

[10] P. Ferguson and D. Senie, “RFC 2827: Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP 
Source Address Spoofing,” https://tools.ietf.org/html/bcp38 

[11] J. Czyz, M. Kallitsis, M. Gharaibeh, C. Papadopoulos, M. Bailey, and M. Karir, “Taming the 800 pound gorilla: The rise 
and decline of ntp ddos attacks,” in IMC ’14. NY, USA: ACM, 2014, pp. 435-448. 

[12] C. Rossow, “Amplification Hell: Revisiting Network Protocols for DDoS Abuse,” in Proceedings of the 2014 Network 
and Distributed System Security (NDSS) Symposium, February 2014. 

[13] J. M. Lucas and M. S. Saccucci, “Exponentially weighted moving average control schemes: Properties and enhancements,” 
Technometrics, vol. 32, no. 1, pp. 1-29, Jan. 1990. 

[14] G. E. P. Box and G. Jenkins, Time Series Analysis, Forecasting and Control. Holden-Day, Incorporated, 1990. 

[15] J. D. Brutlag, “Aberrant behavior detection in time series for network monitoring,” in Proceedings of the 14th USENIX
Conference on System Administration, ser. LISA ’00. Berkeley, CA, USA: USENIX Association, 2000, pp. 139-146.

[16] P. Barford, J. Kline, D. Plonka, and A. Ron, “A signal analysis of network traffic anomalies,” in 2nd ACM SIGCOMM 
Workshop on Internet measurement, NY, USA, 2002, pp. 71-82. 



[17] A. Lakhina, M. Crovella, and C. Diot, “Diagnosing network-wide traffic anomalies,” SIGCOMM Comput. Commun. Rev.,
vol. 34, pp. 219-230, August 2004.

[18] A. Lakhina, M. Crovella, and C. Diot, “Mining anomalies using traffic feature distributions,” in SIGCOMM ’05. NY, USA:
ACM, 2005, pp. 217-228.

[19] G. Cormode and S. Muthukrishnan, “What’s hot and what’s not: Tracking most frequent items dynamically,” ACM Trans.
Database Syst., vol. 30, no. 1, pp. 249-278, Mar. 2005.

[20] R. M. Karp, S. Shenker, and C. H. Papadimitriou, “A simple algorithm for finding frequent elements in streams and bags,”
ACM Trans. Database Syst., vol. 28, no. 1, pp. 51-55, Mar. 2003.

[21] G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava, “Finding hierarchical heavy hitters in data streams,” in VLDB
’03, 2003, pp. 464-475.

[22] C. Estan and G. Varghese, “New directions in traffic measurement and accounting,” SIGCOMM Comput. Commun. Rev., 
vol. 32, no. 4, pp. 323-336, Aug. 2002. [Online]. Available: http://doi.acm.org/10.1145/964725.633056 

[23] R. Schweller, Z. Li, Y. Chen, Y. Gao, A. Gupta, Y. Zhang, P. Dinda, M.-Y. Kao, and G. Memik, “Reverse hashing for 
high-speed network monitoring: Algorithms, evaluation, and applications,” in INFOCOM 2006, 2006, pp. 1-12. 

[24] Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund, “Online identification of hierarchical heavy hitters: Algorithms, 
evaluation, and applications,” in IMC '04. NY, USA: ACM, 2004, pp. 101-114. 

[25] G. Cormode and S. Muthukrishnan, “An improved data stream summary: the count-min sketch and its applications,” J.
Algorithms, vol. 55, no. 1, pp. 58-75, Apr. 2005.

[26] X. Li, F. Bian, M. Crovella, C. Diot, R. Govindan, G. Iannaccone, and A. Lakhina, “Detection and identification of
network anomalies using sketch subspaces,” in IMC ’06, NY, USA, 2006, pp. 147-152.

[27] D. van der Steeg, R. Hofstede, A. Sperotto, and A. Pras, “Real-time DDoS attack detection for Cisco IOS using NetFlow,”
in Integrated Network Management (IM), 2015 IFIP/IEEE International Symposium on. May 2015, pp. 912-911.

[28] A. C. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin, “Algorithmic linear dimension reduction in the ℓ1 norm for
sparse vectors,” in Allerton 2006, 2006.

[29] Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan, “Counting distinct elements in a data stream,” in
RANDOM ’02. London, UK: Springer-Verlag, 2002, pp. 1-10.

[30] N. Alon, Y. Matias, and M. Szegedy, “The space complexity of approximating the frequency moments,” in 28th Annual
ACM Symposium on Theory of Computing, ser. STOC ’96. New York, NY, USA: ACM, 1996, pp. 20-29.

[31] J. Tropp and A. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” Information 
Theory, IEEE Transactions on, vol. 53, no. 12, pp. 4655-4666, Dec 2007. 

[32] D. Donoho, “Compressed sensing,” Information Theory, IEEE Transactions on, vol. 52, no. 4, pp. 1289-1306, April 2006. 

[33] P. Indyk, “Explicit constructions for compressed sensing of sparse signals,” in 19th ACM-SIAM SODA. PA, USA:
Society for Industrial and Applied Mathematics, 2008, pp. 30-33.

[34] G. Cormode and S. Muthukrishnan, “Space efficient mining of multigraph streams,” ser. PODS ’05. New York, NY, 
USA: ACM, 2005, pp. 271-282. 

[35] S. Ranshous, S. Shen, D. Koutra, S. Harenberg, C. Faloutsos, and N. F. Samatova, “Anomaly detection in dynamic 
networks: a survey,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 7, no. 3, pp. 223-247, 2015. 

[36] L. Deri, “Improving passive packet capture: Beyond device polling,” in Proceedings of SANE 2004, 2004.

[37] R. Boyer and J. Moore, “MJRTY - a fast majority vote algorithm,” in Automated Reasoning, ser. Automated Reasoning
Series, R. Boyer, Ed., 1991, vol. 1, pp. 105-117.

[38] J. Carter and M. N. Wegman, “Universal classes of hash functions,” Journal of Computer and System Sciences, 1979. 

[39] M. Kallitsis, S. Stoev, and G. Michailidis, “Hashing Pursuit for Online Identification of Heavy-Hitters in High-Speed 
Network Streams,” July 2014, http://arxiv.org/abs/1412.6148 

[40] F. Fusco and L. Deri, “High speed network traffic analysis with commodity multi-core systems,” in 10th ACM SIGCOMM
Conference on Internet Measurement, ser. IMC ’10. New York, NY, USA: ACM, 2010, pp. 218-224.

[41] S. Gallenmüller, P. Emmerich, F. Wohlfart, D. Raumer, and G. Carle, “Comparison of frameworks for high-performance
packet IO,” in ANCS ’15. Washington, DC, USA: IEEE Computer Society, 2015, pp. 29-38.

[42] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, “On the self-similar nature of Ethernet traffic (extended version),”
Networking, IEEE/ACM Transactions on, vol. 2, no. 1, pp. 1-15, Feb 1994.

[43] M. Crovella and A. Bestavros, “Self-similarity in world wide web traffic: evidence and possible causes,” Networking,
IEEE/ACM Transactions on, vol. 5, no. 6, pp. 835-846, Dec 1997.

[44] S. Stoev, G. Michailidis, and M. Taqqu, “Estimating heavy-tail exponent through max self-similarity,” IEEE Transactions
on Information Theory, vol. 57, no. 3, pp. 1615-1636, 2011.

[45] P. Embrechts, C. Klüppelberg, and T. Mikosch, Modelling Extremal Events. New York: Springer-Verlag, 1997.

[46] M. Kallitsis, S. Stoev, S. Bhattacharya, and G. Michailidis, “AMON: An Open Source Architecture for Online Monitoring, 
Statistical Analysis and Forensics of Multi-gigabit Streams,” September 2015, http://arxiv.org/abs/1509.00268 

[47] D. Lambert and C. Liu, “Adaptive thresholds: monitoring streams of network counts,” J. Amer. Statist. Assoc., vol. 101,
no. 473, pp. 78-88, 2006. [Online]. Available: http://dx.doi.org/10.1198/016214505000000943

[48] M. Kallitsis, S. Stoev, and G. Michailidis, “This paper (Supp. Material),” September 2015, http://tinyurl.com/pdccpce 


